Improved KNN Imputation for Missing Values in Gene Expression Data
Authors
Abstract
The problem of missing values has long been studied by researchers working in the areas of data science and bioinformatics, especially in the analysis of gene expression data that facilitates the early detection of cancer. Many attempts show improvements made by excluding samples with missing information from the analysis process, while others have tried to fill the gaps with possible values. While the former is simple, the latter safeguards against information loss. For that, a neighbour-based (KNN) approach has proven more effective than other global estimators. This paper extends this further by introducing a new summarization method to the KNN model. It is the first study that applies the concept of the ordered weighted averaging (OWA) operator to such a problem context. In particular, two variations of OWA aggregation are proposed and evaluated against their baseline and other neighbour-based models. Using different ratios of missing values from 1%–20% and a set of six published gene expression datasets, the experimental results suggest that the new methods usually provide more accurate estimates than the compared methods. Specific to the missing rates of 5% and 20%, the best NRMSE scores as averages across datasets are 0.65 and 0.69, while the highest measures obtained by the existing techniques included in the study are 0.80 and 0.84, respectively.
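To make the idea in the abstract concrete, here is a minimal Python sketch of neighbour-based imputation with an OWA summarization step. It is an illustration under stated assumptions, not the authors' method: the abstract does not give the two proposed OWA variations or their weights, so the weight vectors below are arbitrary, the function names (owa, rms_distance, knn_owa_impute, nrmse) are hypothetical, and NRMSE is taken here as RMSE divided by the standard deviation of the true values, which is only one common definition.

```python
import math


def owa(values, weights):
    """Classical OWA: weights are applied to the values sorted in descending order."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def rms_distance(a, b):
    """Root mean squared difference over positions observed (not None) in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_owa_impute(data, row, col, k, weights):
    """Estimate data[row][col] from the k nearest rows that have a value in col.

    Raises ValueError (from owa) if fewer than k such rows exist.
    """
    candidates = [
        (rms_distance(data[row], other), other[col])
        for i, other in enumerate(data)
        if i != row and other[col] is not None
    ]
    candidates.sort(key=lambda pair: pair[0])
    neighbour_values = [value for _, value in candidates[:k]]
    return owa(neighbour_values, weights)


def nrmse(estimates, truths):
    """RMSE divided by the population standard deviation of the true values (assumed definition)."""
    n = len(truths)
    mse = sum((e - t) ** 2 for e, t in zip(estimates, truths)) / n
    mean = sum(truths) / n
    variance = sum((t - mean) ** 2 for t in truths) / n
    return math.sqrt(mse / variance)


# Toy matrix: rows are genes, columns are samples, None marks a missing value.
data = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 5.0],
    [9.0, 9.0, 100.0],
]

# Uniform weights reduce OWA to the plain mean of the neighbours' values.
mean_estimate = knn_owa_impute(data, 0, 2, 2, [0.5, 0.5])

# Non-uniform weights (illustrative only) shift the estimate toward the smaller value.
owa_estimate = knn_owa_impute(data, 0, 2, 2, [0.25, 0.75])
```

With uniform weights the OWA step is the same as averaging the k neighbours, so any difference between an OWA-based estimate and the plain KNN average comes entirely from how the weight vector is chosen.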
Similar resources
Missing data imputation in multivariable time series data
Multivariate time series data are found in a variety of fields such as bioinformatics, biology, genetics, astronomy, geography and finance. Many time series datasets contain missing data. Imputing missing data in multivariate time series is a challenging topic that needs to be carefully considered before learning from or predicting time series. Much research has been done on the use of diffe...
Missing value imputation for gene expression data: computational techniques to recover missing data from available information
Microarray gene expression data generally suffer from the missing value problem due to a variety of experimental reasons. Since the missing data points can adversely affect downstream analysis, many algorithms have been proposed to impute missing values. In this survey, we provide a comprehensive review of existing missing value imputation algorithms, focusing on their underlying algorithmic techn...
Missing Values with iterative imputation
In this paper, the author designs an efficient method for iteratively imputing missing target values with semiparametric kernel regression imputation, known as the semi-parametric iterative imputation algorithm (SIIA). While there is little prior knowledge on the datasets, the proposed iterative imputation method, which imputes each missing value several times until the algorithm converges in e...
Regression imputation of missing values in longitudinal data sets.
A stand-alone, menu-driven PC program, written in GAUSS, which can be used to estimate missing observations in longitudinal data sets is described and made available to interested readers. The program is limited to the situation in which we have complete data on N cases at each of the planned times of measurement t1, t2,..., tT; and we wish to use this information, together with the non-missing...
CLIMP - Cluster-based Imputation of Missing Values in Microarray Data
Since their invention in the mid-1990s, many improvements have been achieved in the quality of microarrays. Different kinds of microarrays are in use today in many fields, which has led to a vast number of preprocessing and analysis techniques for data from such microarrays. Due to their complexity and high sensitivity to all different kinds of influences during manufacturing and expe...
Journal
Journal title: Computers, Materials & Continua
Year: 2022
ISSN: 1546-2218, 1546-2226
DOI: https://doi.org/10.32604/cmc.2022.020261